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Abstract 

In this paper we propose a few methods for solving large Lyapunov equations that 
arise in control problems. We consider the common case where the right hand side is 
a small rank matrix. For the single input case, i.e., when the equation considered is of 
the form AX + XA T + bb T = 0, where b is a column vector, we establish the existence 
of approximate solutions of the form X = VCV T where VisJVxm and G is m X m, 
with m small. The first class of methods proposed is based on the use of numerical 
quadrature formulas, such as Gauss- Laguerre formulas, applied to the controllability 
Grammian. The second is based on a projection process of Galerkin type. Numerical 
experiments are presented to test the effectiveness of these methods for large problems. 


1 Introduction 

Many of the matrix equations that arise in control problems can be successfully solved by 
well-known numerical techniques, when the matrices involved are small. In contrast there has 
been very little done to provide numerical methods for solving these same problems when 
the associated matrices are very large. Yet, there are now several applications in control 
which lead to matrix equations involving very large sparse matrices. These typically aiiv 
whenever the model involves a partial differential equation in several space dimensions such 
as when considering large space structures [2] or when a large network model, e.g. electrical 
network, [3] is involved. 

Recently, there has been a few efforts directed towards large scale matrix problems in 
control. In [5] we proposed a method for partial pole placement, which consists of placing a 
few of the poles of the matrix, namely only those that are unstable. 1 he methods proposed 
are based on projecting the problem onto a small invariant subspace of A associated with 
the unstable eigenvalues. 
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1 


In this paper we focus on Lyapunov’s equations. The numerical solution of these equa- 
tions have gained considerable importance in the last few years due to the crucial role that 
they play in the so-called II analysis. The basic problem addressed by this theory is to find 
a low dimensional ODE model that approximates the original dynamical system governed 

by 

x — Ax + Bu (1) 

y — Cx 

Here, A\sNxN,B is N x p and C is q x N. In the situation we consider N may be very 
large while p and q are very small. The closeness of this system from its lower dimensional 
model is measured in terms of the singular values of the matrix A Y where X and Y are 
solutions of the two Lyapunov equations, 

AX + XA r + BB t = 0 (2) 

A t Y + YA + C T C = 0 (3) 

These are termed Ilankel singular values of (1). Typically, one would like to compute some 
of the largest Hankel singular values. In this paper we will provide the tools for computing 
approximations to X and Y. Moreover, these approximations are in such a form that they 
allow easy computation of the largest singular values of XY . 

Throughout, we will assume that the eigenvalues of A have negative real parts. In this 
situation there is an explicit formula for the solution to (2), 

X = r c TA BB T e TAT dr (4) 

Jo 

The above expression is known as the controllability Grammian of (1). 

Much of the theory in this paper will be established by using the above formula, The main 
observation which w'c will prove in the next section, is that there are accurate approximations 
to the system (2) of small rank. Extracting such approximations will be the subject of 
sections 3 and 4. 


2 Low rank solutions to the Lyapunov equation 

The purpose of this section is to identify the types of approximations that will be used in 
this paper. We will restrict ourselves to the case corresponding to the single input situation 
in (1) > i.e., the equation (2) becomes 

AX + X A t + bb T = 0 (5) 

where b is a single column vector, and its explicit solution is given by 

X = r e TA bb T e rAT dr. (6) 

Jo 
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One way to integrate formula (6), would be to start by replacing (6) by its approximation 

X(s) = T e TA bb T e TAT dr (?) 

Jo 

where s is selected so that the error in the approximation is small enough. Note that we 
must integrate in the interval [0,s] a function of the form w(t)w(t) T where w(t) is t he vector 
function w(t) = e tA b. Assume that we approximate w(t ) in the interval [0,s] as ?y (0 ~ 
w m (t) = q m (At)b , where q m is a polynomial of degree m - 1. The resulting approximat ion 

for X(s) is 

X m (s)= [ w m (T)w m {T) T dr ( 8 ) 

Jo 

This approximation can be made arbitrarily accurate by increasing the degree of q m . Thus, 
by choosing a large enough s and and a large enough degree for q , we can make the approx- 
imation X m (s) to X as accurate as desired. Let V m = [b,Ab,...,A m 'b] and q m (t) = 
aQ p ai t -f ... + Then w m (t) can be written as w m (t) — K m 2 m (f), z m (t) — 

[a 0 i and therefore 

X m (s) = V m Qf z m {r)z m {T) T d-r^j V* = V m G m V Z (3) 

where G m is an m x m matrix whose entries can easily be shown to be equal to 

(m) (10) 

9ij i + j- 1 ' 

Throughout the paper we will exploit approximations to the solution X which are of the 

form r nu 

X — V G V T ( 1 1 ) 

where V m = [ui,u 2 ,....,u m ] is a fixed set of vectors and G m is an arbitrary m x m matrix. 
The set of all such matrices is clearly a subspace of the space of N x N matrices. The range 
of any matrix in this subspace is included in the subspa.ee h = span{V m } while its kernel 
contains the orthogonal to K. The subspace of the matrices (11) is in fact uniquely defined 
by the range K of these matrices. Thus, we will denote by Z m (K) the space of all matrices 
of the form (11), or Z m if there is no ambiguity, where V m is a basis of the subspace K. If we 
restrict the matrices to be symmetric, then Z m is of dimension (m(m + l))/2. If one warns 
to deal with the more general situation where the right-hand-side is of a more gencial form 
bv T rather than bb T then G can no longer restricted to be symmetric and the dimension of 
Z m is m 2 . 

What we have just shown above is that there are approximations from Z m (/t m ) where 
I\ m is the Krylov subspace 

K m = span{b, Ab , ..., A m ~ l b} (12) 

that will converge to X as m tends to infinity. Here, we should mention that , rigorously 
speaking, this statement is trivial in the finite dimensional case since there is always an 
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exact solution of the form (11) for m > N, However, ‘convergence’ is meant in the infinite 
dimensional context, in the same way that one speaks of the convergence of the conjugate 
gradient method which is known to be a finite process. 

If we can find a good approximat ion to w(t) in the interval [0,s], by a low degree poly- 
nomial, we should be able to find a good approximation to X from Z m . However, the above 
formulas should not be used as a practical procedure since they are likely to be highly unsta- 
ble. Section 4 presents a process that is mathematically (but not algorithmically) equivalent 
to the procedure outlined above with a specific choice for q m . In the next section we ex- 
ploit well-known numerical quadrature formulas to derive particular subspaces K and their 
corresponding approximations. 


3 Use of numerical quadrature 

The first procedure that we propose for approximating X is based upon evaluating the inte- 
gral (6) by standard numerical quadrature formulas. We wfill describe two such procedures. 
First, assume that s has been chosen and that we can evaluate w[t) at m equally spaced 
points 

f* = (* — 1 ) — ~,t = (13) 

777 — 1 

then an appropriate quadrature formula for evaluating (6) would yield, 

m 

= X>,Mh MU) T (14) 

t = 1 

where the are the quadrature coefficients. In other words, 

X m = W m AW m (15) 

in which W m is the N x m matrix W m = [».»(<,), w(t 2 ), ..., w(f m )] and A = Diag(6i, 

Note that tn(^i) = b. In order to use the formula (14) we also need to compute and save 
i = 2..., m. One way in which this can be done is by solving the system of Ordinary 
Differential Equations 

w = Aw (16) 

u>(0) = 6 (17) 

by any technique and save the vectors w(l t ), i — 1,2, .., rn at the points t,. Alternatively, one 
may compute io(ti+ 1 ) from w(ti) by iv(t,+\) = c AAt 'w(ii), where A = h+i — f;, in a number 
of efficient ways as has been recently suggested in [4]. The problem considered in [4] is to 
approximate the product of the exponential of a matrix times a vector. We will describe one 
such approach in some detail in Section 4. 

Thus, the class of algorithms based on numerical quadrature would begin by choosing a 
quadrature rule and performing the following procedure. 
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Algorithm 1 

(1) Start: Define = £>; 

(2) For i = 2, ...,m do: 


• Compute from 

• Save w(t,) in the i-th column of W r 


It is clear that the actual matrix X m should never be computed explicitly, since it is 
a dense N x N matrix in general. One only needs to store W m and A m . In fact, it is 
worth pointing out that one may not even need to store W m if the original pioblem is 
to approximate Xv where v is some vector, rather than to compute an approximation to 
the matrix X itself. In this situation the approximation x = X m v can be accumulated as 
x ; _j x (8iw(ti) T v)w(ti) every time that a new w[ti ) is generated and one can discard the 

old w(ti)'s. . 

The question that we would like to address next is which integration schemes to use am 
how to implement them. First, a restriction of the scheme is that the weights 8, should be 
positive, since we know that the solution is semi-positive definite matrix. For example, this 
is not satisfied for the open Newton-Cotes formulas but it seems to be true for the closed 
formulas and for the Gauss-Legendre formulas. We have implemented and tested two basic 
classes of quadrature formulas. 

1. Closed Newton-Cotes formulas with 3 points (Simpson’s rule), 5 points, and 7 points. 

2. Gauss-Laguerrc formulas with 9 points and 15 points. 

Newton-Cotes formulas are fairly standard and the corresponding coefficients can be 
found in any elementary numerical analysis textbook. These use equally spaced points 
which may be an undesirable feature in our case because of the exponential beha\ior of the 
function to integrate. At the beginning of the interval w(t) varies far more than for large i. 
Therefore it is natural to expand the intervals of integration as we proceed. One procedure 
we have experimented with is to double the size of the intervals [x;,Xi+k] onm which the 
Newton-Cotes formula is based at every time. For the 7-point Newton-Cotes formula we 
triple the size of the interval. This simple modification does provide better results than 
keeping a fixed interval. 

The Gauss-Laguerre quadrature formula is a highly accurate scheme to evaluate intcgials 
of functions of the form e~ l g(t) on the infinite half line. The basic formula is, 

rOO 

Jo ,=i 

where the t, are the roots of the Laguerre polynomial of degree m and where the u),' s are 
the quadrature weights. For general functions the formula is used as follows, 

Tfl m 

[ g(r)d,T — f c- T (^g(r))dT = ^^g(t t )g(t t ) = ( 1 ^) 

Jo Jo i=1 i=l 
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The roots and the coefficients for our numerical tests have been taken from [1]. 

An aspect which we found to be critical to the performance of the Gauss-Laguerre in- 
tegration is a proper scaling of the variable r. For some problems, such as the one in the 
numerical experiments section, the norm of A can be very large and as a result the vector 
w(t t ) can be tiny for most of the tf s, thus leading to a poor evaluation of the integral (6), 
For example assume that A is symmetric negative definite having Aj = —10 as its eigenvalue 
closest to zero, and consider Gauss Laguerre integration with 15 points. Then for the second 
root t 2 = 1.21559,. all the eigencomponents of w(t 2 ) will be no larger than e“ 12 ' 5559 " ~ 10" 06 . 
For tz the components do not exceed ~ 1.38610 -10 , and beyond tz the be- 

come too small. A remedy is to use a change of variable £ = car to ensure that the value 
of w(t) at the last root t m is not too small or too large. For example, if an estimate of the 
rightmost eigenvalue Aj of A is available, we might choose a so that |e°' <mAl | = tol , where tol 
is some prescribed tolerance, 

4 A Krylov subspace technique 

In this section we present a method that is based on a global approximation to ic(£) in 
[0, +oo). Explicit integration of this approximation will then yields an approximation to X . 
This, as will be seen, is equivalent to a projection process of Galerkin-type. 

The first question that we address is how to approximate e A b for a given vector b. This 
was considered in [4] where polynomial approximation to the exponential was used. The 
approximation to c A b is taken of the form 

c A b « Pm -\(A)b (20) 

where p m _ i is a polynomial of degree m — 1. 

Clearly, one can use other types of approximations, e.g., rational, which are usually more 
accurate. The attraction of polynomials is the fact that they do not require solving linear 
systems. One way in which a good polynomial can be found is by attempting to minimize 
some norm of the error e z — p m -\{z) on a continuum in the complex plane that encloses 
the spectrum of A . For example, Chebyshev approximation can be used. The disadvantage 
of this approach is that it requires some approximation to the spectrum of A. In [4] we 
considered a technique based on Arnoldi’s method which does not require any eigenvalue 
information. This technique is now summarized. 

The approximation (20) to c A b is an element of the Krylov subspace (12). In this approach 
we need to generate an orthonormal basis V m = [tq, i> 2 , u 3 , . , . , v m ] of K m via the well-known 
Arnoldi algorithm starting with v\ = /)/J|6|j 2 . 
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Algorithm 2: Arnoldi 

1. Initialize: Compute v i := fc/IHh- 

2. Iterate: Do j = 1 , 2 , m 

1. Compute w Avj 

2. Compute a set of j coefficients hij so that w := w - £i=i h i} Vi is orthogonal to 
all previous vf s. 

3. Compute h J+ ij = ||ui ||2 and t>j+i = w/hj+xj. 


By construction the above algorithm produces an orthonormal basis V m — • D’mji 

of the Krylov subspace I< m . If we denote the m x m upper Hessen berg matrix consisting of 
the coefficients computed by the algorithm by //,„ we have the relation 

AK m = V m H m 

from which we get H m = V^AV m . Therefore II m represents the projection of the linear 
transformation A onto the subspace A' m , with respect to the basis V m . As is well-known 
in the particular case where A is symmetric, Arnoldi’s algorithm simplifies to the Lanczos 
process in which case H m becomes tridiagonal symmetric. 

We can write the desired approximation to x = e A b as x m = p m (A)u or equivalently 
x m = V m y where y is an m-vector. In [4], the choice y = fle H,n ei with ft — IHh was 
suggested, leading to the following formula for arbitrary t, 

e tA b ~ (3V m c tHrn e\ ( 22 ) 

The quality of this approximation was also analyzed in [4] and the following result was 
shown. 

Theorem 4.1 Let A be any square matrix and let p = ||A|| 2 . Then the error of the approx- 
imation (22) is such that 

|| e tA b - pV m e tH ™exh < 2 /^“T^ ( 23 ) 

Experiments reported in [4], reveal that this approximation can be very accurate c\en for 
moderate values of the degree m. The theorem shows convergence of the approximation (22) 
for fixed /, as m increases to oo. However, note that the above approximation is exact when 

m = N, see [4]. 

Let us now substitute the expression (22) in (6). We obtain the approximation 

X m = V m (jf°° e THm {l3ex){l3c 1 ) T c TH ™dT S j V* = V m G m V*. ( 2 4) 
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Assuming that II m has eigenvalues with negative real parts, we note that. G m is the solution 
of the m x m Lyapunov equation 

H m G m + G m Hl + /? 2 e,e[ = 0 (25) 

In other words, modulo the polynomial approximation made on the exponential of /!, we 
have reduced the original problem into one of dimension m. This raises the question as to 
whether or not the process just described is mathematically equivalent to a projection-type 
method such as a Galerkin process. 

To see that this is the case we need to define the subspace of approximants and the 
inner product. The subspace of approximants will be the subspace Z m (A m ) as defined in 
Section 2. This is the same as the subspace of all matrices of size N of the form V m GV 
where V m is the orthogonal basis of K m as constructed from the Arnoldi process. Recall 
that from our definition V m is fixed and therefore the actual variable is the m x m matrix 
G. For the inner product we take the usual inner product in R N , i.e. , the inner product of 
two matrices taken as vectors of IV 2 elements. It is a simple exercise to show that this can 
also defined by 

< X, Y >= tr(XY T ) (26) 

The Galerkin condition defines an approximation X by stipulating that X belong to Z m 
and that the residual R(X) = AX + X A r + bb T be orthogonal to all of the subspace of 
approximants. This second condition gives, 

tr[Z R T (X )\ = 0, VZ e Z m (27) 

and therefore, 

0 - tr[V m GV^R(Xf} = tr[GV^R(X) T V m ) V G (28) 

Taking matrices G of the form G — = l,...,m leads immediately to 

V^R(X) T V m = 0. (29) 

Let us now substitute X = V m G m Vj t in (29). Remembering that V m is orthogonal and that 
we have b = j3v\, where V\ is the first column of V m , we get 

0 = VlR(X) T V m = V^(AV m G m \'Z + V m G m V^A T + bb T )V m 

= HmG m + G m Hl + p 2 c x e[ (30) 

which is exactly (25). We have therefore just proved the following result. 

Theorem 4.2 The Galerkin method applied to the Lyapunov equation (5) over the subspace 
Z m is mathematically equivalent to approximating the solution X by evaluating the integral 
(6) in which w(r) — e rA b is replaced by its approximation (22). 

This theorem can allow one to establish error bounds for the approximation provided by 
a Galerkin-type process onto the subspace Z m . 
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m 

IIAHIf 

Time (sec) 

5 

1.10 E-01 

0.18 

10 

5.40 E-06 

0.23 

15 

7.92 E-07 

0.35 

20 

1.92 E-07 

0.45 


Table 1 : Performance of the Krylov snbspace method. 

5 Numerical Experiments 

In this section we describe a few numerical experiments in order to test and compare the 
methods described in Sections 3 and 4. All the tests have been performed in double pre- 
cision arithmetic on an Ardent Titan super workstation. Our example is derived fiom the 
discretization of a partial differential equation of the form: 

^7 = A u + F(x,y)g(t) ( 3l ) 

ot 

in a rectangular domain, with Dirichlet boundary conditions. Here A = T ' s the 
Laplacean operator. If we discretize the rectangle using n x + 2 points in the x direction and 
n y + 2 points in the y direction, the above equations lead to a matrix problem of the form: 

ii = An + by (32) 

where A is square of dimension N = n x n y . In this experiment we took n x = 20 and n y = 40 
leading to a matrix of size 800. The corresponding matrix has a 1 -norm of 3,528.0. Wc 
have taken b to be simply t\ the first column of the identity. Tests with other choices for b 
showed similar results. First we would like to show the behavior of the residual a< hieved bj 
the Krylov subspace method described in Section 4, as the degree rn varies. Table 1 shows 
the scaled Frobenius norm of the residual, i.e., the quantity ||AX m + X m A T + bb T \[r, where 
\\Z\\f = (tr[Z T Z]/N) 1 / 2 . This is done for m = 5,10,15,20. 

We have used the Arnoldi process instead of the Lanczos algorithm on purpose, despite 
the fact that the matrix is symmetric. This is in order to give an idea of the increase in 
time in the more general nonsymmetric case. The table indicates that the accuracy of the 
Krylov subspace approximation to the Lyapunov equation is good for very small in and then 
improves slowly. The times reported in this table and in the next one are in seconds on the 
Titan and have been obtained using the -03 compiling option. 

Next we would like to test the methods based on numerical integration described in 
Section 3. Just as in the previous test wc show the residual norm and the time to compute 
the approximate solution for various choices of accuracy and step size. In all of the methods 
we used the same change of variable for scaling purposes as described at the end of Section 3: 
we scaled the variable t by 2.5/||/l||i. To approximate c AAt 'it>i we used the formula ( 20 ) of 
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Method 

m 

|| Res p 

Time 

Laguerre(9) 

9 

4.21 E-06 

0.57 

Laguerre(15) 

15 

7.08 E-08 

0.92 

NC-3 At = 0.3 

6 

3.47 E-04 

0.23 

NC-3 At = 0.1 

11 

1.17 E-04 

0.30 

NC-3 At = 0.05 

16 

8.53 E-05 

0.44 

NC-5 At = 0.40 

9 

1.59 E-04 

0.26 

NC-5 At = 0.20 

12 

5.97 E-05 

0.35 

NC-5 At = 0.20 

16 

1.45 E-05 

0.41 

NC-7 At = 0.5 

7 

3.76 E-04 

0.23 

NC-7 At = 0.25 

13 

4.96 E-05 

0.36 

NC-7 At = 0.20 

18 

9.60 E-06 

0.46 


Table 2: Behavior of the integration techniques. 

degree 5 for the Newton Cotes formulas and 10 for the Gauss-Laguerre formulas. In general 
the Newton-Cotes formulas do not perform as well as those based on Gauss-Laguerre. The 
disadvantage of Gauss-Laguerre formulas is that if their accuracy is not sufficient one must 
restart all over again since the roots of the Laguerre polynomials of different degrees are 
different. Not so with the Newton Cotes formulas. 


6 Conclusions 

We have proposed and tested two types of techniques for approximating solutions to large 
Lyapunov equations that arise in control problems. The attraction of the Krylov subspace 
method is its simplicity and its overall effectiveness. The Gauss-Laguerre formula may be 
particularly useful when only the product of X by a vector is desired and one cannot afford 
to store the vectors The use of Newton-Cotes formulas is not recommended without 

incorporating a proper change of variables to improve effectiveness of the integration scheme. 
Although we have relied on the integral formulation of the solution which requires t hat the 
eigenvalues of A all have negative real parts, the Galerkin approach may be applied to more 
general situations. The theory for this is not established however. Another case that we have 
not addressed is the general situation of equations of the form AX + XA T + B — 0, where 
D is may be sparse but necessarily of small rank. A completely different approach may be 
required in this situation, one that might exploit any possible sparsity in the solution. 
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